CASSANALYTICS-151: Cannot read start offset from BTI with big partitions by lukasz-antoniak · Pull Request #199 · apache/cassandra-analytics

lukasz-antoniak · 2026-04-17T19:50:26Z

…artitions

michaelsembwever

+1 tested, fixes the issue

yifan-c · 2026-04-20T05:24:54Z

        try
        {
-            withPartitionIndex(ssTable, descriptor, metadata, true, false, (dataFileHandle, partitionFileHandle, rowFileHandle, partitionIndex) -> {
+            withPartitionIndex(ssTable, descriptor, metadata, true, true, (dataFileHandle, partitionFileHandle, rowFileHandle, partitionIndex) -> {


For simplicity, I think loadDataFile and loadRowsIndex are always true. Can we simplify the method signature?

In Cassandra's implementation, the row index component is always opened.

Good catch, thank you. Code updated.
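To illustrate the simplification suggested above, here is a minimal sketch, not the PR's actual code. The class, the method names, and the string "handles" are hypothetical stand-ins. The point is only that when every caller passes `true` for both flags, the flags can be dropped and both components opened unconditionally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from cassandra-analytics.
class ComponentLoading
{
    // "Before" shape: two boolean flags that, in practice, every caller sets to true.
    static List<String> openWithFlags(String ssTable, boolean loadDataFile, boolean loadRowsIndex)
    {
        List<String> opened = new ArrayList<>();
        opened.add(ssTable + "-Partitions.db");
        if (loadDataFile)
        {
            opened.add(ssTable + "-Data.db");
        }
        if (loadRowsIndex)
        {
            opened.add(ssTable + "-Rows.db");
        }
        return opened;
    }

    // "After" shape: the flags are gone and both components are always opened.
    static List<String> open(String ssTable)
    {
        List<String> opened = new ArrayList<>();
        opened.add(ssTable + "-Partitions.db");
        opened.add(ssTable + "-Data.db");
        opened.add(ssTable + "-Rows.db");
        return opened;
    }

    public static void main(String[] args)
    {
        // Both shapes open the same components when the flags are true.
        System.out.println(open("example").equals(openWithFlags("example", true, true)));
    }
}
```

A shorter signature also removes the chance of a caller passing `false` for a component that the read path later needs.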

yifan-c · 2026-04-20T06:02:20Z

+                            {
+                                LOGGER.error("Missing key key={} token={} partitioner={}",
+                                             key,
+                                             toToken(partitioner, index),


It should be toToken(partitioner, i), since i is the partition index. Please rename i too.

Meanwhile, the variable index can be removed.

yifan-c · 2026-04-20T06:02:34Z

+                            {
+                                LOGGER.error("Key read by more than 1 Spark partition key={} token={} partitioner={}",
+                                             key,
+                                             toToken(partitioner, index),


Same, it should be toToken(partitioner, i)

yifan-c · 2026-04-20T06:02:58Z

-                                         partitioner.name());
-                        }
-                        else if (count > 1)
+                        for (int j = 0; j < counts[i].length; j++)


nit: rename j to rowIndexInPartition?

yifan-c · 2026-04-20T06:03:19Z

                    MutableInt skippedPartitions = new MutableInt(0);
                    MutableLong skippedDataOffsets = new MutableLong(0);
-                    int[] counts = new int[numKeys];
+                    int[][] counts = new int[numPartitions][numRowsPerPartition];


nit: rename counts to partitions

I think this two-dimensional array really stores the count of views for each partition and row, so maybe we should leave it as counts?
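The naming discussion above can be made concrete with a small sketch of how such a per-partition, per-row count array might be checked. This is an assumption-laden illustration, not the test's actual code: the class name, the `findAnomalies` helper, and the message format are hypothetical. Only the `int[][] counts` shape and the "missing" versus "read more than once" distinction come from the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; only the int[][] counts shape is taken from the diff.
class ReadCountCheck
{
    // Returns one message for every (partition, row) cell whose count is not exactly 1.
    static List<String> findAnomalies(int[][] counts)
    {
        List<String> anomalies = new ArrayList<>();
        for (int partitionIndex = 0; partitionIndex < counts.length; partitionIndex++)
        {
            for (int rowIndexInPartition = 0; rowIndexInPartition < counts[partitionIndex].length; rowIndexInPartition++)
            {
                int count = counts[partitionIndex][rowIndexInPartition];
                if (count == 0)
                {
                    // The key was never read by any reader.
                    anomalies.add("missing partition=" + partitionIndex + " row=" + rowIndexInPartition);
                }
                else if (count > 1)
                {
                    // The key was read by more than one reader.
                    anomalies.add("duplicate partition=" + partitionIndex + " row=" + rowIndexInPartition);
                }
            }
        }
        return anomalies;
    }

    public static void main(String[] args)
    {
        int[][] counts = {{1, 1}, {0, 2}};
        findAnomalies(counts).forEach(System.out::println);
    }
}
```

Using descriptive loop variables such as `partitionIndex` and `rowIndexInPartition` makes it harder to pass the wrong index when building a log message, which is the kind of mistake flagged in the review comments.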

yifan-c · 2026-04-20T06:09:59Z

Is the change in this file (for 4.0) necessary? The bug fixed is only in the BTI format code path.

My motivation was to keep both classes in sync unless the changes cannot be applied. I have now rolled back the change in the four-zero bridge.

CASSANALYTICS-151: Cannot read start offset from BTI index with big p…

96786a5

…artitions

michaelsembwever reviewed Apr 17, 2026

Comment thread ...andra-four-zero-bridge/src/test/java/org/apache/cassandra/spark/reader/IndexOffsetTests.java Outdated

michaelsembwever reviewed Apr 17, 2026

Comment thread ...andra-four-zero-bridge/src/test/java/org/apache/cassandra/spark/reader/IndexOffsetTests.java Outdated

michaelsembwever reviewed Apr 17, 2026

Comment thread ...andra-five-zero-bridge/src/test/java/org/apache/cassandra/spark/reader/IndexOffsetTests.java

michaelsembwever reviewed Apr 17, 2026

Comment thread ...andra-five-zero-bridge/src/test/java/org/apache/cassandra/spark/reader/IndexOffsetTests.java

michaelsembwever approved these changes Apr 17, 2026

yifan-c reviewed Apr 20, 2026

Apply review comments

462c79c
